File Import and Processing
This document explains the file import and contact extraction system used to process CSV, Excel (.xlsx/.xls), and text files for bulk messaging applications. It covers automatic file type detection, fallback parsing strategies, column detection for phone numbers and names, robust error handling, supported formats and naming conventions, performance considerations for large files, and security measures for file processing.
The system spans two primary environments:
Electron desktop application (frontend) with IPC handlers for file operations
Python backend service for robust contact extraction and validation
File type detection and routing: Electron detects file extension and routes to appropriate parsers.
CSV/Excel extraction: Python backend uses pandas to parse structured spreadsheets and applies keyword-based column detection.
Text file parsing: Python backend splits lines and attempts to extract phone numbers and names using regex heuristics.
Manual number parsing: Python backend parses free-form text with name-number pairs and standalone numbers.
Phone number cleaning/validation: Standardized normalization and length validation across all components.
Error handling: Graceful fallbacks and safe defaults when parsing fails.
The system connects Electron IPC handlers to a Flask-based Python backend service that performs the file processing.
File Type Detection and Routing
Electron inspects the file extension and routes the file to one of two paths:
Python backend via HTTP POST, which accepts CSV, TXT, XLSX, and XLS
Native CSV parser within Electron, which handles CSV and TXT only
In the Electron-managed import flow, formats the native parser does not support (e.g., XLSX/XLS) currently return empty results, so spreadsheets need the backend path.
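The routing decision can be sketched as a small lookup on the file extension. This is only an illustration: the real Electron side is JavaScript, and the function name `route_import`, the `use_backend` flag, and the returned labels are hypothetical.

```python
from pathlib import Path

# Extensions the document lists as accepted by the Python backend.
BACKEND_EXTENSIONS = {".csv", ".txt", ".xlsx", ".xls"}
# Extensions the Electron-managed flow is described as parsing natively.
NATIVE_EXTENSIONS = {".csv", ".txt"}


def route_import(filename: str, use_backend: bool) -> str:
    """Return a hypothetical route label for a file, based on its extension."""
    ext = Path(filename).suffix.lower()
    if use_backend:
        return "backend" if ext in BACKEND_EXTENSIONS else "rejected"
    # Electron-managed flow: spreadsheets are not parsed and yield no contacts.
    return "native" if ext in NATIVE_EXTENSIONS else "empty"
```

Lower-casing the suffix keeps `contacts.XLSX` and `contacts.xlsx` on the same path.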
CSV Extraction Pipeline
Uses pandas to read CSV files.
Keyword-based column detection:
Phone columns: look for keywords such as phone, number, mobile, cell, tel.
Name columns: look for keywords such as name, contact, person.
Fallback strategy:
If pandas parsing fails, falls back to Python's csv.reader, opening the file with UTF-8 encoding.
Defaults to the first column as the phone number and the second as the name if no keyword matches are found.
Flow summary: after a successful pandas read, the parser detects columns by keyword matching, selects phone_col and name_col, and iterates over the rows. Each value is passed to clean_phone_number(). A valid number becomes a contact {number, name}, with the name taken from name_col or None, and is appended to contacts[]; an invalid number is skipped and the loop moves to the next row. If the pandas read fails, the file is opened as UTF-8 with csv.reader and the rows go through the same loop, with the name taken from row[1] or None. The function returns contacts[].
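A minimal sketch of the keyword-based column detection and the row loop follows. To stay dependency-free it uses the standard library's `csv` module rather than pandas. It also assumes the first row is a header, and `simple_clean` is a simplified stand-in for the shared `clean_phone_number()`. The helper names and the tie-breaking when only one column matches are assumptions, not the project's actual code.

```python
import csv
import io
import re

PHONE_KEYWORDS = ("phone", "number", "mobile", "cell", "tel")
NAME_KEYWORDS = ("name", "contact", "person")


def detect_columns(headers):
    """Return (phone_idx, name_idx); falls back to columns 0 and 1."""
    lowered = [str(h).strip().lower() for h in headers]
    phone_idx = next((i for i, h in enumerate(lowered)
                      if any(k in h for k in PHONE_KEYWORDS)), 0)
    name_idx = next((i for i, h in enumerate(lowered)
                     if i != phone_idx and any(k in h for k in NAME_KEYWORDS)), None)
    # Assumption: use the second column as the name only when it is free.
    if name_idx is None and phone_idx != 1 and len(lowered) > 1:
        name_idx = 1
    return phone_idx, name_idx


def simple_clean(raw):
    """Stand-in for clean_phone_number(): keep digits, accept 7 to 15 of them."""
    digits = re.sub(r"\D", "", raw)
    return digits if 7 <= len(digits) <= 15 else None


def extract_contacts(csv_text):
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    phone_idx, name_idx = detect_columns(rows[0])
    contacts = []
    for row in rows[1:]:
        if phone_idx >= len(row):
            continue  # short row, no phone cell
        number = simple_clean(row[phone_idx])
        if number is None:
            continue  # invalid numbers are skipped
        name = None
        if name_idx is not None and name_idx < len(row):
            name = row[name_idx].strip() or None
        contacts.append({"number": number, "name": name})
    return contacts
```

Because matching is by substring, a header such as "Contact Number" is claimed by the phone keywords first, so the phone column is resolved before the name column.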
Excel (.xlsx/.xls) Extraction Pipeline
Uses pandas to read Excel files.
Applies the same keyword-based column detection as the CSV pipeline.
Fallback strategy:
If pandas parsing fails, returns an empty contact list without raising or reporting an error.
Text File Parsing Pipeline
Reads file as UTF-8 text.
Splits lines and attempts to split by common separators (comma, semicolon, tab, pipe).
Heuristic to detect phone numbers:
Look for segments made up of digits and common phone punctuation (+, -, parentheses, spaces).
If the line does not split cleanly, search the entire line with a regex for a phone-like string.
Name extraction:
First non-empty segment that does not look like a phone number.
Cleans and validates phone numbers using the shared validator.
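The per-line heuristic above can be sketched as follows. The regex and the function names are illustrative assumptions. The actual patterns used by the backend are not shown in this document.

```python
import re

# Phone-like run: optional "+", then digits mixed with spaces, dots,
# dashes, or parentheses, at least 7 characters long in total.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")
SEPARATORS_RE = re.compile(r"[,;\t|]")


def looks_like_phone(segment):
    return PHONE_RE.fullmatch(segment.strip()) is not None


def parse_line(line):
    """Return (raw_phone, name) for one line, or None if no phone is found."""
    segments = [s.strip() for s in SEPARATORS_RE.split(line) if s.strip()]
    phone = next((s for s in segments if looks_like_phone(s)), None)
    if phone is None:
        # No segment is a clean phone number: search the whole line instead.
        match = PHONE_RE.search(line)
        if match is None:
            return None
        leftover = (line[:match.start()] + line[match.end():]).strip(" ,;\t|:-")
        return match.group(0), (leftover or None)
    # Name: first non-empty segment that does not look like a phone number.
    name = next((s for s in segments if not looks_like_phone(s)), None)
    return phone, name
```

The returned phone string is still raw. It would then go through the shared cleaning and validation step.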
Manual Number Parsing (Free-form Text)
Parses free-form text entries with optional name-number pairs.
Supports separators: newline, comma, semicolon.
Tries to split each entry on a colon, dash, or pipe to separate the name from the number.
Falls back to treating the entire entry as a phone number.
Validates and formats numbers using the shared validator.
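A minimal sketch of this free-form parsing is shown below. Since dashes also appear inside phone numbers, the sketch accepts a split only when the left side contains no digits. That rule, the function names, and the digit-only validation are assumptions for illustration rather than the project's exact logic.

```python
import re

ENTRY_SPLIT_RE = re.compile(r"[\n,;]+")  # entry separators: newline, comma, semicolon


def digits_only(text):
    return re.sub(r"\D", "", text)


def parse_manual_numbers(text):
    contacts = []
    for entry in ENTRY_SPLIT_RE.split(text):
        entry = entry.strip()
        if not entry:
            continue
        name, raw_number = None, entry  # fallback: the whole entry is the number
        for sep in (":", "|", "-"):
            left, found, right = entry.partition(sep)
            left, right = left.strip(), right.strip()
            # Accept the split only if the left part looks like a name.
            if found and left and not any(c.isdigit() for c in left) and digits_only(right):
                name, raw_number = left, right
                break
        number = digits_only(raw_number)
        if 7 <= len(number) <= 15:  # stand-in for the shared validator
            contacts.append({"number": number, "name": name})
    return contacts
```

For example, `parse_manual_numbers("Ada: 555-010-9999\nBob - 5550101234")` yields two named contacts, while a bare `555-010-9999` is kept whole and treated as a number.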
Phone Number Cleaning and Validation
Removes separators and non-digit characters except plus sign.
Normalizes leading zeros and adds country prefix when applicable.
Validates that the cleaned number contains 7 to 15 digits, a realistic range for phone numbers.
Used consistently across CSV, Excel, TXT, and manual parsing.
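The cleaning steps could be sketched as below. The name `clean_phone_number` appears in the pipeline description, but its signature is not given. The `default_country_code` parameter, the handling of a leading "00", and the returned string format are assumptions made for this example.

```python
import re


def clean_phone_number(raw, default_country_code=None):
    """Return a normalized number string, or None if it is not plausible."""
    text = str(raw).strip()
    has_plus = text.startswith("+")
    digits = re.sub(r"\D", "", text)  # drop separators and other non-digits
    if not has_plus and digits.startswith("00"):
        # Assumption: "00" international prefix is treated like a leading "+".
        digits, has_plus = digits[2:], True
    elif not has_plus and digits.startswith("0") and default_country_code:
        # Assumption: a national leading "0" is replaced by the country code.
        digits, has_plus = default_country_code + digits[1:], True
    if not 7 <= len(digits) <= 15:
        return None  # unrealistic length
    return ("+" if has_plus else "") + digits
```

Returning `None` for rejected input lets every caller (CSV, Excel, TXT, manual) skip invalid rows with the same check.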
Upload Validation and Security Measures
Electron file import dialog restricts accepted file types for WhatsApp contacts.
Python backend validates file extensions and rejects unsupported types.
File uploads are saved temporarily and removed after processing to prevent accumulation.
A maximum content length is enforced to limit upload size.
Secure filename handling prevents path traversal.
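The extension check and temporary-file cleanup can be sketched with the standard library alone. In a Flask service these measures would typically rely on `werkzeug.utils.secure_filename` and the `MAX_CONTENT_LENGTH` configuration key. The helper names and the 16 MB limit below are hypothetical, since the actual limit is not stated here.

```python
import os
import tempfile

ALLOWED_EXTENSIONS = {"csv", "txt", "xlsx", "xls"}
MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # hypothetical limit, not taken from the source


def allowed_file(filename):
    """Accept only filenames whose final extension is on the allow-list."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def process_upload(filename, data, handler):
    """Validate, write to a temp file, run handler(path), always delete the file."""
    if not allowed_file(filename):
        raise ValueError("unsupported file type")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("upload too large")
    suffix = "." + filename.rsplit(".", 1)[1].lower()
    # The server picks the temp name, so the client-supplied path is never used.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return handler(path)
    finally:
        os.remove(path)  # runs even if the handler raises
```

Placing the removal in `finally` means a parsing error cannot leave uploads behind on disk.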
The system relies on:
Python libraries: Flask, pandas, openpyxl, xlrd, werkzeug
Electron IPC for secure communication between frontend and main process
Pyodide for running Python code in the browser for manual number parsing
CSV fallback parsing reads files line-by-line, which is memory-efficient for large files.
Excel parsing uses pandas; for very large Excel files, consider chunked reading or limiting rows.
Phone number validation runs per row; keep regex patterns minimal and reuse compiled patterns if scaling.
File uploads are removed after processing to avoid disk pressure.
Electron-managed CSV/TXT import avoids heavy backend calls for small files processed in the renderer.
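As an illustration of the streaming and pattern-reuse advice, the sketch below yields contacts one row at a time with a regex compiled once at import time. It assumes the fallback layout of number in the first column and name in the second. The function name is hypothetical.

```python
import csv
import re
from typing import Iterable, Iterator

NON_DIGITS_RE = re.compile(r"\D")  # compiled once and reused for every row


def iter_contacts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield contacts lazily so the whole file never sits in memory."""
    for row in csv.reader(lines):
        if not row:
            continue
        digits = NON_DIGITS_RE.sub("", row[0])
        if 7 <= len(digits) <= 15:
            name = row[1].strip() if len(row) > 1 and row[1].strip() else None
            yield {"number": digits, "name": name}
```

A caller would pass an open file handle, for example `with open(path, newline="", encoding="utf-8") as f: for contact in iter_contacts(f): ...`, and can stop early without reading the rest of the file.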
Common issues and resolutions:
Unsupported file type: Ensure file extension is CSV, TXT, XLSX, or XLS. XLSX/XLS are not supported in the Electron-managed import flow.
Encoding errors: Files should be UTF-8 encoded. The system attempts UTF-8 decoding; non-UTF-8 files may fail.
Malformed data: Phone numbers must contain 7–15 digits after cleaning. Entries with invalid phone formats are skipped.
Large files: CSV fallback parsing is designed for streaming; Excel files may require optimization or smaller chunks.
Column naming: Use keywords like phone, number, mobile, cell, tel for phone columns; name, contact, person for names.
The file import and contact extraction system provides a robust, multi-format pipeline with automatic detection and fallback strategies. It supports CSV, Excel, and text files, with keyword-based column detection for phone numbers and names. Phone number cleaning and validation ensure consistent formats, while error handling and security measures protect against malformed inputs and unsupported formats. For large CSV files, the line-by-line fallback parser keeps memory usage low; large Excel files may still need chunked reading or row limits.